Data Structures for Dynamic Query Browsing of EOS Data Directories


Julie M. Pointek 
Ben Shneiderman

Human-Computer Interaction Laboratory
A.V. Williams Building
University of Maryland
College Park, MD  20742
e-mail: {pointek,ben}@cs.umd.edu
phone: 301 405 2725 ; fax: 301 405 6707
Date: July 12, 1995


INTRODUCTION

NASA's Earth Observing System Data and Information System (EOSDIS) seeks to
manage large volumes of complex data from satellite and field management
programs, allowing users to query and search for relevant subsets of data.
The challenge in developing an appropriate user interface for this system
lies in meaningfully representing these data to a diverse user community
in a compact, easy-to-use manner.

The dynamic query approach lends itself well to this problem. Dynamic queries
involve range searches on multi-key data sets. Users directly manipulate
slider or button widgets to specify ranges for each key value. Within a
dynamic query interface, changes to query fields are reflected in a visual
display of the data almost instantaneously, usually within 100 ms.

To reduce lengthy network delays due to the transfer of enormous volumes of
data, we plan to implement a two-level query architecture. The initial level
will allow the user to narrow the query selection on a coarser scope, while
the second refined query level will allow users to locally view and manipulate
the previously selected data subset.

Because of the need for immediate feedback and display, efficient query
computation must be supported using suitable data structures and search
algorithms. Previous work in dynamic queries has resulted in favorable
performance for smaller data sets (less than 5000) using simple data
structures. However, these data structures are poorly suited for large,
complex data sets such as those present in EOSDIS, prompting the need for
development of new multidimensional data structures and algorithms.


MULTIDIMENSIONAL DATA STRUCTURES:  AN EXPERIMENT

Previous work in performance comparisons of multidimensional data structures
was conducted by Jain and Shneiderman (1994) who evaluated the multilist,
grid array, k-d tree, and quad tree data structures for dynamic queries
with respect to storage and search costs. According to their results, the
multilist performs well for uniformly distributed data, but its high storage
overhead only lends itself well to small data sets. The grid array is a good
choice for large, uniform data sets. The two tree structures are better-suited
for skewed data distributions of more than 1,000 points. The k-d tree requires
less memory, but has a marginally higher search overhead than the quad tree.
However, when the data distribution is unknown, the k-d tree appears to be the
superior choice.


PROPOSED DATA STRUCTURES

Due to the diverse nature of the data, we can conjecture that the distribution
of EOS data will not be uniform. Coupled with the large volume of data,
and upon further examination of EOS data characteristics and requirements,
the k-d tree appears to be a promising data structure for a dynamic query
interface.

The k-d tree, initially proposed by Bentley (1975), is a type of binary search
tree used to handle the case of single records having multiple keys. The
general k-d tree structure will be modified and customized to fit the EOSDIS
data structure.

Our current work is the implementation and evaluation of query performance and
storage overhead of the k-d tree and other relevant data structures in
conjunction with EOSDIS data. Additionally, investigations are being made into
possible structure and search optimizations to facilitate dynamic queries.


CONCLUSION

A dynamic query interface has been proposed to manage data browsing and
retrieval among the large volumes of EOS data. In order to maintain
ease-of-use and sufficient power, dynamic queries require smooth and rapid
performance, necessitating the development of new and efficient data
structures capable of handling complex multidimensional data.